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Abstract: The precise knowledge of the atmospheric neutrino fluxes is a key ingredient in 
the interpretation of the results from any atmospheric neutrino experiment. In the standard 
data analysis, these fluxes are theoretical inputs obtained from sophisticated numerical 
calculations based on the convolution of the primary cosmic ray spectrum with the expected 
yield of neutrinos per incident cosmic ray. In this work we present an alternative approach 
to the determination of the atmospheric neutrino fluxes based on the direct extraction 
from the experimental data on neutrino event rates. The extraction is achieved by means 
of a combination of artificial neural networks as interpolants and Monte Carlo methods for 
faithful error estimation. 
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1. Introduction and Motivation 

One of the most important breakthroughs in particle physics, and the only solid evidence 

for physics beyond the Standard Model, is the discovery - following a variety of independent 
experiments [1] - that neutrinos are massive and consequently can oscillate among their 
different flavor eigenstates. The flavour oscillation hypothesis has been supported by an 
impressive wealth of experimental data, one of the most important pieces of evidence 
coming from atmospheric neutrinos [2—5]. 

Atmospheric neutrinos originate in the collisions of cosmic rays with air nuclei in 
the Earth's atmosphere. The collision produces mostly pions (and some kaons), which 
subsequently decay into electron and muon neutrinos and anti-ncutrinos. These neutrinos 
are observed in underground experiments using different techniques [2-5]. In particular, in 
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the last ten years high precision and large statistics data has been available from the Super- 
Kamiokande experiment [3], which has clearly established the existence of a deficit in the 
/i-like atmospheric events with the expected distance and energy dependence from 
oscillations with oscillation parameters Am^^jjj ~ 2 x 10~^ eV^ and tan^^atm = 1- This 
evidence has also been confirmed by other atmospheric experiments such as MACRO [4] 
and Soudan 2 [5]. 

The expected number of atmospheric neutrino events depends on a variety of com- 
ponents: the atmospheric neutrino fiuxes, the neutrino oscillation parameters and the 
neutrino-nucleus interaction cross section. Since the main focus of atmospheric neutrino 
data interpretation has been the determination of neutrino oscillation parameters, in the 
standard analysis the remaining components of the event rate computation arc inputs 
taken from other sources. In particular, the fluxes of atmospheric neutrinos arc taken from 
the results of numerical calculations, such as those of Refs. [6-8], which are based on the 
convolution of the primary cosmic ray spectrum with the expected yield of neutrinos per 
incident cosmic ray [9]. 

The oscillations of can also be tested in Long Baseline (LBL) experiments, using as 
neutrino source a controlled beam of accelerator neutrinos. The results of the first two of 
these LBL experiments, K2K [10] and MINOS [11], confirm, both in the observed deficit 
of events and in their energy dependence, that accelerator f^, oscillate as expected from 
oscillations with the parameters inferred from the atmospheric neutrino data. Furthermore 
they already provide a competitive independent determination of the relevant Am^, and 
with time either MINOS and/or other future LBL experiments [12, 13] will give a most 
precise determination of the mixing angle. 

The attainable accuracy in the independent determination of the relevant neutrino os- 
cillation parameters from non-atmospheric neutrino experiments [11-13] makes it possible 
to attempt an inversion of the strategy: to use the oscillation parameters (independently 
determined in non atmospheric neutrino experiments) as inputs in the atmospheric neu- 
trino analysis in order to extract the atmospheric neutrino fluxes directly from the data. 
One must notice, however, that at present this independent determination of the oscilla- 
tion parameters is still incomplete, since neither MINOS nor K2K have provided us with 
enough precision, in particular for what concerns the relevant mixing angle. Consequently 
such inversion of the strategy is still not possible without further assumptions. In that 
respect our results can be regarded as the presentation of a novel method to determine 
the atmospheric neutrino flux which, at present, can only be applied under the assumption 
that oscillation parameters will eventually be measured at LBL experiments and that they 
will have a value close to the present best fit value. 

There are several motivations for such direct determination of the atmospheric neutrino 
fluxes. First of all it would provide a cross-check of the standard flux calculations as well as 
of the size of the associated uncertainties (which, being mostly theoretical, are difficult to 
quantify). Second, a precise knowledge of the atmospheric neutrino flux is of importance 
for high energy neutrino telescopes [14], both because they are the main background and 
they are used for detector calibration. Finally, such program may quantitatively expand the 
physics potential of future atmospheric neutrino experiments [15-17]. Technically, however. 
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this program is challenged by the absence of a generic parametrization of the energy and 
angular functional dependence of the fluxes which is valid in all the range of energies where 
there is available data. 

In this work we present the results of a first step in the direction of this alterna- 
tive approach on the determination of the atmospheric neutrino fluxes: we will determine 
the energy dependence of the atmospheric neutrino fluxes from the data on atmospheric 
neutrino event rates measured by the Super-Kamiokande experiment. The present experi- 
mental accuracy of the Super-Kamiokande experiment is not enough to allow for a separate 
and precise determination of the energy, zenith angle and flavour dependence of the atmo- 
spheric flux. Consequently, in this work we will rely on the zenith and flavour dependence 
of the flux as predicted by some of the atmospheric flux calculations in Refs. [6-8], and 
discuss the estimated uncertainty associated with this choice. Furthermore, as discussed 
above, the fluxes are determined under the assumption that oscillation parameters will 
eventually be independently determined by non atmospheric neutrino experiments with a 
value close to the present best fit. 

In our determination of the energy dependence of the atmospheric neutrino fluxes, 
the problem of the unknown functional form for the neutrino flux is bypassed by the 
use of neural networks as intcrpolants. Artificial neural networks have long been used 
in different fields, from biology to high energy physics, and from pattern recognition to 
business intelligence applications. In this work we use artificial neural networks since they 
are a most unbiased prior, that is, they allow us to parametrize the atmospheric neutrino 
flux without having to assume any functional behavior. Furthermore, the determined flux 
comes together with a faithful estimation of the associated uncertainties obtained using 
the Monte Carlo method for error estimation. 

Indeed the problem of the deconvolution of the atmospheric flux from experimental data 
on event rates is rather close in spirit to the determination of parton distribution functions 
in deep-inelastic scattering from experimentally measured structure functions [18, 19]. For 
this reason, in this work we will apply to the determination of the atmospheric neutrino 
fluxes a general strategy originally designed to extract parton distributions in an unbiased 
way with faithful estimation of the uncertainties^ [20-25]. 

The outline of this paper is as follows: in Section 2 we describe our general strategy 
that we will use including the description of the experimental data used, as well as the 
Monte Carlo replica generation and the neural network training procedure. In Section 3 
we present the results for our reference fit, and in Section 4 we analyze the impact of 
various choices that define this reference. After concluding, four appendices summarize 
the most technical aspects of this work: the definition of several statistical estimators, 
an example of the Monte Carlo method for error estimation, some details of the neural 
networks employed, and a brief review of genetic algorithms. 



^This strategy has also been successfully applied with different motivations in other contexts like tau 
lepton decays [26] and B meson physics [27]. 
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2. General Strategy 



The general strategy that will be used to determine the atmospherie neutrino fluxes was 
first presented in Ref. [20] and needs only some modifications in order to be adapted to 
the problem under consideration. The path from the data to the flux parametrization 
involves two distinct stages [20,22]. In the first stage, a Monte Carlo sample of replicas of 
the experimental data on neutrino event rates ("artificial data") is generated. These can 
be viewed as a sampling of the probability measure on the space of physical obscrvables 
at the discrete points where data exist. In the second stage one uses neural networks to 
interpolate between these points. In the present case, this second stage in turn consists of 
two sub-steps: the determination of the atmospheric event rates from the atmospheric flux 
in a fast and efficient way, and the comparison of the event rates thus computed to the 
data in order to tune the best-fit form of input neural flux distribution ("training of the 
neural network"). Combining these two steps, the space of physical observablcs is mapped 
onto the space of fluxes, so the experimental information on the former can be interpolated 
by neural networks in the latter. 

Let us now describe each stage in turn for the specific case discussed in this paper. 
2.1 Experimental Data and Generation of Replicas 

In the present analysis we use the complete data on atmospheric neutrino event rates 
from the phase-I of the Super-Kamiokande experiment [3]. Data from other atmospheric 
neutrino experiments [4, 5], although important as a confirmation, are not included in 
this work because of their lower statistical significance. Higher energy data from neutrino 
telescopes like Amanda are not publicly available in a format which allows for its inclusion 
in the present analysis and its treatment is left for future work. 

The full Super-Kamiokande-I atmospheric neutrino data sample is divided in 9 different 
types of events: contained events in three energy ranges, Sub-GeV, Mid-GeV^ and Multi- 
GeV electron- and muon-like, partially contained muon-like events and upgoing stopping 

and thrugoing muon events. Each of the above types of events is divided in 10 bins in 
the final state lepton zenith angle (pi, with — 1 < cos<pi < 1 for contained and partially 
contained events and — 1 < cos0| < for stopping and thrugoing muon events. Therefore 
we have a total of A^dat = 90 experimental data points, which we label as 

^ = l,...,iVdat. (2.1) 

Note that each type of atmospheric neutrino event rate is sensitive to a different region 
of the neutrino energy spectrum. For example the expected event rate for contained events 

^In the Sub-GeV samples the lepton momentum pi satisfies pi < 400 MeV while in the Mid-GeV samples 
the lepton momentum pi satisfies pi > 400 MeV. 
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can be computed as: 



Ri = n,g,Ty^ dh dc, dE, dEi dca dipa ,„ " {E„c,) 

•''-1 JE^in JE^i^ J-1 Jo dE,dC, 

d^a"^ 

KtiE,, c„ h) P^^piE,, h I ff) -^^J-{E„ El, ca) ef^Ei,ci{c„ Ca, <Pa)) , (2.2) 

where P^^p {Pa^p) is the ~^ ^p) conversion probabihty for given values of 

the neutrino energy E,, the cosine c, of the angle between the incoming neutrino and the 
vertical direction, the production altitude /i, and the oscillation parameters fj. Here ntgt is 
the number of targets, T is the experiment running time, <I>J {^a) i^ the flux of atmospheric 
neutrinos (antincutrinos) of type a, is the altitude distribution (normalized to one) of 
the neutrino production point and (ct^) is the charged-current neutrino- (antineutrino-) 
nucleon interaction cross section. The variable Ei is the energy of the final lepton of type /3, 
while Ca and ipa parametrize the opening angle between the incoming neutrino and the final 
lepton directions as determined by the kinematics of the neutrino interaction. Finally, e^™ 
gives the probability that a charged lepton of type energy Ei and direction c; contributes 
to the given bin. 

Correspondingly for upgoing muons the expected number of events in each bin can be 
evaluated as: 



/•CO /•+! roo i-Ev t-Ef^ />+! /.27r 

Ri = Prod.Tj2 dh dc, dE, dEl dEl"" dCa difa 

a,± J — 1 "^Emin •^^'min "^^min J —1 JO 

^^-{E,, c) Kt{E,, c„ h) P^^^{E„ c, h I ff) -^^Jt_^E„ dEl, Co) 

RrockiEl EfT) A^^iE^^, Cl{c„ Ca,^a)) , (2-3) 

where Prock is the density of targets in standard rock, -Rrock is the effective muon range [28] 
for a muon which is produced with energy E^ and reaches the detector with energy E^^, 
and A]^q^ is the effective area for stopping and thrugoing muons respectively. 
Thus in general 

where ^^J' gives the contribution of the neutrino fiux of energy E, to the event rate 
Ri after weighting with the neutrino interaction cross sections, muon propagation and 
detection efficiencies. In order to illustrate which part of the atmospheric neutrino spectrum 
is mostly determined by each event type we plot in Fig. 1 the function — (normalized 
to one) for the Honda atmospheric fluxes [6], for each event type - Sub-GeV, Mid-GeV 
and Multi-GeV, partially contained, stopping and thrugoing muon events- averaged over 
zenith angle (assuming oscillations with Am^tm = 2.2 x 10^'^ cV^ and tan^ ^atm = !)• In 
particular from the figure we note that for energies larger than E, ~ few TcV and smaller 
than E, ~ 0.1 GeV there is essentially no information on the atmospheric neutrino flux 
coming from the available event rates. 
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Figure 1: Sensitivity to the neutrino energy of the different types of neutrino event rates (see 
Eq. (2.4)). In this figure we assume oscillations with Am^^j^^ — 2.2 x 10~^ eV'^ and tan^ ^atm — 1 
and the Honda atmospheric fluxes [6]. 



The experimental correlation matrix for the event rates Ri"^^^^ is constructed in the 



following way: 

stat,2 e I \ ^ cor,n cor,n 



2 [■ , \ ^ cor,n cc 



Pi'j - ^tot^tot ' l-^-^i 



where the statistical uncertainty is given by 



af-^ = Vijf , (2.6) 

and the A'^cor correlated uncertainties are computed from the couplings factors, vr" to the 
corresponding pull [29] , 

^cor,n _ ^(-P)^n ^ (2.7) 

The couplings Trf' used in the analysis are a generalization of those given in Ref. [30] to 
include the separation between sub-GeV and mid-GeV and the partially contained samples, 
and will be described in more detail in a forthcoming publication. 
In particular, we consider three different sets of correlated errors: 

1. Experimental systematic uncertainties {exp), like uncertainty in the detector calibra- 
tion and efficiency. 

2. Theoretical cross-section uncertainties (cross), like cross-section normalization errors 
and cross section ratios errors. 
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[X Stop 


0.16 
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0.10 0.19 0.41 


/X Thru 


0.08 


0.02 0.08 0.23 
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Tot al 


0.10 


0.03 0.11 0.03 


0.11 0.15 0.35 


0.12 0.1(3 0.36 



Table 1: Features of experimental data atmospheric neutrino event rates from Super- 
Kamiokande [3]. Experimental errors are given as percentages. 



3. Theoretical zenith angle and flavor flux uncertainty {fix). 

Note that in standard neutrino oscillation parameter global fits like [3] , the flux uncertain- 
ties include on top of the above a flux normalization and a flux tilt errors, but in our case 
we do not have to include them because wc arc determining the energy dependence of the 
atmospheric flux (including its normalization) directly from the neutrino data. In other 
words, since our objective is to establish how well we can determine the normalization 
and energy dependence of the flux directly from the data, nothing is assumed about them: 
neither their values nor their theoretical uncertainties. We will go back to this point after 
Eq. (2.12). 

Finally the total error is computed adding the statistical and correlated errors in 
quadrature 



^tot 
(Ti 



stat,2 1 / cor,n\2 o\ 

+Z^i^i )- (2-8) 



n=l 

We summarize the characteristic values of the uncertainties in the different data samples 
in Table 1. From the table we see that the experimental statistical errors are more impor- 
tant in the high energy samples while the correlations are dominated by the cross section 
uncertainties. 

The purpose of the artificial data generation is to produce a Monte Carlo set of 'pseudo- 
data', i.e. Nj-ep replicas of the original set of N^s,t data points: 

^{axt)W. k = l,...,N,,p, i = l,...,iVdat, (2.9) 

such that the -A^rep sets of -/Vdat points are distributed according to an -/Vdat-dimensional 
multi-gaussian distribution around the original points, with expectation values equal to 
the central experimental values, and error and covariance equal to the corresponding ex- 
perimental quantities. 
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This is achieved by defining 

^(art)W^^(exp)^^(.)^tot^ i = 1, . . . , TVdat , k = 1, . . . , N,,^ , (2.10) 

where A'rep is the number of generated repHcas of the experimental data, and where rf'^ 
are univariate gaussian random numbers with the same correlation matrix as experimental 
data, that is they satisfy 

(.fVf)^^^^.«r'.o(^). (2.n) 

Because [31] the distribution of the experimental data coincides (for a flat prior) with the 
probability distribution of the value of the event rate Ri at the points where it has been 
measured, this Monte Carlo set gives a sampling of the probability measure at those points. 

Note that all errors considered (including correlated systematics and theoretical uncer- 
tainties) must be treated as gaussian in this framework. However, this does not imply that 
the resulting flux probability density is gaussian, since the Monte Carlo method allows for a 
non-gaussian distribution of best-fit atmospheric neutrino flux distributions to be obtained 
as a result of the fitting procedure. 

We can then generate arbitrarily many sets of pseudo-data, and choose the number of 
sets A'^rcp in such a way that the properties of the Monte Carlo sample reproduce those of 
the original data set to arbitrary accuracy. The relevant issue at this point is to determine 
the minimum number of replicas required to reproduce the properties of the original data 
set with enough accuracy. 

In our particular case we have generated two different samples of replicas, each with one 
different set of correlated uncertainties, as described above, exp+cross, and exp+cross+flx. 
A fit including the exp uncertainties only is meaningless since the results of the Super- 
Kamiokande experiment are not consistent with the neutrino oscillation hypothesis (since 
the overall Xgj^ is too high) and with the results of K2K and MINOS (since the pre- 
ferred Amgj^ is too low) if both flux and cross-section uncertainties are simultaneously 
neglected [35]. The features of the Monte Carlo generated replicas with these two types 
of uncertainties included in the generation are summarized in terms of several statistical 
estimators (see Appendix A for definitions) in Table 2. 

These statistical estimators allow us to assess in a quantitative way if the Monte Carlo 
sample of replicas reproduces the features of the experimental data. For example, we can 
check that averages, variance and covariance of the pseudo-data reproduce central values 
and covariance matrix elements of the original data. Prom the table we see that for the 
two sets of uncertainties considered, 100 replicas are enough to reproduce the properties 
of the original data set with the required accuracy, namely a few percent, which is the 
accuracy of the experimental data while it is clear that 10 replicas fall short to reproduce 
the statistical properties of the experimental data. 

2.2 Neural Network Training 

The second step consists of training A'^rcp neural networks. In our case each neural network 
parametrizes a differential flux, which in principle should depend on the neutrino energy Ei,, 
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137.6 145.7 145.7 

122.7 129.2 53.2 
0.98 0.98 0.42 



Table 2: Comparison between experimental data and Monte Carlo data for different types of 
uncertainties and different numbers of A^rep included in the pseudo-data generation. 



the zenith angle eos (^//) = a,nd the neutrino type t {t — 1, ... ,4 labels the the neutrino 
flavor: electron neutrinos and antineutrinos, and muon neutrinos and antineutrinos), and 
is based on all the data in one single replica of the original data set. However, the precision 
of the available experimental data is not enough to allow for a separate determination of 
the energy, zenith angle and type dependence of the atmospheric flux. Consequently in 
this work we will assume the zenith and type dependence of the flux to be known with 
some precision and extract from the data only its energy dependence. Thus the neural flux 
parametrization will be: 

where NN {Ei,) is the neural network output when the input is the neutrino energy E^ 

NN {Ey) = NN {Ey,Cj) . (2.13) 

and it depends on the neutrino energy^ as well as on the parameters Co of the neural net- 
work (see Appendix C for details). In Eq. (2.12) <I>('"°^) is a reference difl'erential flux, which 
we take to be the most recent computations of either the Honda [6] or the Bartol [7] col- 
laborations, extended to cover also the high-energy region by consistent matching with the 
Volkova fluxes [34] . Notice that in what respects the normalization and energy dependence 
of the fluxes, the choice of reference flux is irrelevant. Any variation on the normalization 
or on the energy dependence of the reference flux can be compensated by the corresponding 
variation of NN (Ei,) so that the output flux $(°'^*) will be the same. The dependence of 



^In fact the input variable is rather logj^,, E^, since it can be shown [33] that an appropriate preprocessing 
of the input parameters speeds up the training process. 
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the results of the analysis on the reference flux comes because of the differences among the 
different flux calculations in angular and flavour dependence. 

Nothing further is assumed about the function NN (E^) whose value is only known 
after the full procedure of training of the neural net is finished. There are, however, some 
requirements about the choice of the architecture of the neural network. As discussed in 
Refs. [20, 21] such choice cannot be derived from general rules and it must be tailored to 
each specific problem. The main requirements for an optimal architecture are first of all 
that the net is large enough so that the results are stable with respect small variations of 
the number of neurons (in this case the neural net is called redundant) and, second, that 
this net is not so large than the training times become prohibitive. For our problem the 
neural network must have a single input neuron (whose value is \og{Ei,)) and a final output 
neuron (whose value is the NN {Ey)) and a number of hidden layers with several neurons 
each (see appendix C for further details). We have checked that an architecture with two 
hidden layers with 5 neurons each, i.e. a 1-5-5-1 network, satisfies the above requirements 
in the present case. 

The process which determines the function NN [En) which better describes each of the 
k = 1, . . . ,Nrep sets of artificial data, {i?!^*^^'^^}, is what we call training of the neural 
network. It involves two substeps. First for a given NN (Ei,) the expected atmospheric 
event rates have to be computed in a fast an efficient way. Second the neural network 
parameters have to be determined by minimizing some error function. We describe them 
next. 



2.2.1 From Atmospheric Fluxes to Event Rates 

The expected event rate for contained and upgoing muon events for a given set of neural 
network parameters Co, or what is the same for a given value of the neural network flux, 
can be obtained by substituting Eq. (2.12) into Eqs. (2.2) and (2.3) respectively. However, 
from a practical point of view the above expressions are very time-consuming to evaluate, 
and this is a very serious problem in our case since in the neural network approach one 
requires a very large number of evaluations in the training process. 

The procedure can be speed up if one realizes that for a given flux $(°^*) the expected 
event rates can always be written as 

^(net) = ^ r dc, r dE, ^^-^^t) {E,, C„ t) Ci{E,, c,,t), (2.14) 

^ J — 1 J ^Jmin 

where t = {a,zb} labels both the flavor and the chirality of the initial neutrino state. 
Comparing Eq. (2.14) with Eqs. (2.2) and (2.3) we get: 

Ci{Ei„c„,t) =ntgtT'S2 / dh dEi / dca / dtpa 

p Jo JE^in J-i Jo 

Kt{Ey, c^, h) Pt-.p{Ey, c^, h I ff) fj"^ {Ey, Ei,ca) ef''{Ei,ci{cy, Ca, <^„)) (2.15) 
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for contained events, and 



Ci{E^,C^,t)=ProckT dh dEl 

Jo J E^-.„ J E, 



^1 



dE- 



'fin 



-1 



(Pa,, 



1 i-2tt 
dCa / difa 

Jo 





Kt{Ei„ h) Pt^^{Ey, Cy, h I ??) ^^0^^ (-^i^' ^^M' 



(2.16) 



for upgoing-muon events. Consequently, if one discretizes the differential fluxes in iVg 
energy intervals /g S'ld Nz zenith angle intervals Iz one can write 



e,z 

^= 1, 



,4, 



(2.17) 
(2.18) 



and therefore it is possible to write the theoretical predictions as a sum of the elements of 
the discretized flux table. 



(net) 



(net) 
ezt 1 



(2.19) 



ezt 



^(net) ^ f 
ezt — / 
J- 



where the coefHcients Cg^t, which are the most time-consuming ingredient, need only to be 
precomputed once before the training, since they do not depend on the parametrization of 
the atmospheric neutrino flux. 

In our calculations we have used iVg = 100 energy intervals, equally spaced between 
logio E,y = —1 and logio Ey = b (with E^ in GeV) and Nz = 80 bins in zenith angle, equally 
spaced between = — 1 and Cv = 1 so 

l+6e/Afe n-l+z/N^ 

dlog^oE. / dc, $(°^*)(£;„c„0, (2.20) 

-l+6{e-l)/Ne J-l+{z~l)/N^ 

where the integrations in Eq. (2.20) are performed via Monte Carlo numerical integration. 
2.2.2 Minimization Procedure 

Prom each replica of artificial data ^R^^'^^^'^^y an atmospheric neutrino flux parametrized 
with a neural network is obtained. The iVdat data points in each replica are used to 

determine the parameters of the associated neural net. The fit of the fluxes to each replica 
of the data, or what is the same, the determination of the parameters that define the neural 
network, its weights, is performed by maximum likelihood. This procedure, the so-called 
neural network training, proceeds by minimizing an error function E^''\ which coincides 
with the of the experimental points when compared to their theoretical determination 
obtained using the given set of fluxes: 

2 



eC"^ (io) 
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A^dat 
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^(net)(k) ^-.^ 


1 + E^?^« 


n(art)(fc) 
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-stat 



(2.21) 
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The case k = corresponds to the experimental values, = Ry^^'. The E^^^ has to 

be minimized with respect to u), the parameters of the neural network. 

We perform two different type of fits which we denote by exp+cross and exp+cross+flx. 
To be consistent we include in each one the same correlated uncertainties that have been 
included in the replica generation. For example, if we want to include only the effects of 
the exp+cross uncertainties, in Eqs. (2.5) and (2.8) the sum includes only experimental 
systematic and theoretical cross section uncertainties, while in Eq. (2.21) one imposes 
= for fix uncertainties. 

Unlike in conventional fits with errors, however, the covariance matrices of the best- 
fit parameters are irrelevant and need not be computed. The uncertainty on the final 
result is found from the variance of the Monte Carlo sample. This eliminates the problem 
of choosing the value of Ax^ which corresponds to a one-sigma contour in the space of 
parameters. 

Rather, one only has to make sure that each neural net provides a consistent fit to 
its corresponding replica. If the underlying data are incompatible or have underestimated 
errors, the best fit might be worse than one would expect with properly estimated gaussian 
errors — for instance in the presence of underestimated errors it will have typically a value 
of per degree of freedom larger than one. However, neural nets are ideally suited for 
providing a fit in this situation, based on the reasonable assumption of smoothness: for 
example, incompatible data or data with underestimated errors will naturally be fitted less 
accurately by the neural net. Also, this allows for non-gaussian behavior of experimental 
uncertainties. 

The minimization of Eq. (2.21) is performed with the use of genetic algorithms (summa- 
rized in Appendix D, see Ref. [21] and references therein for a more complete description). 
Because of the nonlinear dependence of the neural net on its parameters, and the nonlocal 
dependence of the measured quantities on the neural net (event rates are given by multi- 
dimensional convolutions of the initial flux distributions) , a genetic algorithm turns out to 
be the most efficient minimization method. The use of a genetic algorithm is particularly 
convenient when seeking a minimum in a very wide space with potentially many local min- 
ima, because the method handles a population of solutions rather than traversing a path 
in the space of solutions. 

The minimization is ended after a number of iterations of the minimization algorithm 
large enough so that E^'^'> of Eq. (2.21) stops decreasing, that is, when the fit has converged.^ 
Thus an important issue in the procedure is to determine the right number of iterations 
which should be used. In order to determine them, we define the total x^ot) 




rep 



(exp) ^ 




mmr- 



E 



.Stat 



n 



(2.22) 



n 



V 



*Note that the standard criterion to stop neural network training, the overlearning criterion, cannot be 
used in our case due to the scarce amount of data. 
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Figure 2: Dependence of the Xtot with the number of iterations in the minimization procedure, 
for fits with different sets of uncertainties incorporated. 

with the event rates computed as an average over the sample of trained neural nets, 



and we study its value as a function of the number of iterations used in the minimization. 

We show in Fig. 2 the dependence of Xtot the number of minimization iterations, 
in the two cases considered. From the figure we see that the number of iterations needed 
to achieve convergence can be safely taken to be Nu = 150 both for the in the exp+cross 
and exp+cross+flx fits. 

Thus at the end of the procedure, we end up with Nj-^p fluxes, with each flux <I)('^'^*)('^) 
given by a neural net. The set of Nj-^p fluxes provide our best representation of the corre- 
sponding probability density in the space of atmospheric neutrino fluxes: for example, the 
mean value of the flux at a given value of E,^ is found by averaging over the replicas, and 
the uncertainty on this value is the variance of the values given by the replicas. Generally, 
we expect the uncertainty on our final result to be somewhat smaller than the uncertainty 
on the input data, because the information contained in several data points is combined. 

There are two type of tests that can be performed on the properties of this probability 
measure. First, the self-consistency of the Monte Carlo sample can be tested in order to 
ascertain that it leads to consistent estimates of the uncertainty on the final set of fluxes, 
for example by verifying that the value of the flux extracted from different replicas indeed 




(2.23) 
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behaves as a random variable with the stated variance. This set of tests aUows us to 
make sure that the Monte Carlo sample of neural nets provides a faithful and consistent 
representation of the information contained in the data on the probability measure in the 
space of fluxes, and in particular that the value of fluxes and their (correlated) uncertainties 
are correctly estimated. 

Furthermore the properties of this measure can be tested against the input data by 
using it to compute means, variance and covariances which can be compared to the input 
experimental ones which have been used in the flux determination. 



3. Results for the Reference Fit 



In this section we discuss our results for the reference fit of the atmospheric neutrino fluxes. 
For this fit we use the Honda [6] flux as reference, the exp+cross+flx set of uncertainties, 
and we assume i/^ — oscillations with oscillation parameters Aml^^ = 2.2 x 10~^ and 
tan^^atm = 1- We will discuss in next section the dependence of the results on these 
choices. 

As described in the previous section, we start by generating a set of A^rep = 100 repli- 
cas of the experimental data points according to Eq. (2.10) where in 0"*°* we include both 
the statistical as well as the correlated errors from experimental systematic uncertainties, 
theoretical cross section uncertainties and the theoretical flux uncertainties in the angular 
distributions and neutrino type ratios. As shown in Table 2, this Monte Carlo sample of 
replicas reproduces with enough precision the statistical features of the original experimen- 
tal data. 

After that we proceed to the training of the neural networks as described in Sec. 2.2. 
Once the training of the sample of neural networks has been completed, we obtain the 
set of A^rep fluxes which provide us with the probability density in the space of 

atmospheric neutrino fluxes. In particular we compute the average atmospheric neutrino 
flux as 



rep Nj-ep 



^5]Ariv('=)(£:,) 
'■''P fc=i 



(3.1) 



and the standard deviation as 



aUE,,c,,t) = ^J2 (^^°'*^^'^(^.,c„t)) - {E,,c,,t) . (3.2) 

-iVrep ^ / \ / rep 

for any given value of the energy Ei^, the zenith angle Cjy and the neutrino type t. 

In Fig. 3 we show the results for the flux (in particular we show the angular averaged 
muon neutrino flux) as compared with the computations of the Honda [6] and Bartol [7] 
groups. The results of the neural network fit are shown as the (^'^°°*'')j.ep ='= "^^ band as a 
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Figure 3: Results for the reference fit for the angular averaged muon neutrino flux and comparison 
with numerical computations. 



function of the neutrino energy. We see from the figure that the flux obtained from this 
fits is in reasonable agreement with the results from the the calculations of Honda and 
Bartol groups. We also see that at lower energies the present uncertainty in the extracted 
fluxes is larger than the range of variations between calculations while at higher energies 
the opposite holds. The fit also seems to prefer a slightly higher flux at higher energies. 

The statistical estimators for this reference training are given in the first column in 
Table 3, where the different estimators can be found in Appendix A. Note that errors are 
somewhat reduced, as expected if the neural network has found the underlying physical law, 
correlations increase and covariances are appropriately reproduced, a sign that the sample 
of trained neural networks correctly reproduces the probability measure that underlies 
experimental data. We will return to the discussion on error reduction in next section. 

In Fig. 4, we plot the relative error (T$/<I> as function of the energy which, as seen 
in the figure, grows at the lowest and highest energies. The fact that the relative error 
grows in the region where less data is available reflects the fact that the behavior of neural 
networks in those regions is not determined by its behavior in the regions where more data 
is available, as it would happen in fits with usual functional forms. 
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0.78 



Table 3: Comparison between experimental data and the results of the neural network training 
for the reference fit (Honda fluxes as reference with exp+cross+flx errors), for the fit with Bartol 
fluxes as reference with exp+cross+flx errors and for Honda reference fluxes with with exp+cross 
errors only. In all cases A^icp = 100 replicas are used. 
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Figure 4: Relative error in the determination of the flux. 



The first and second derivatives of the flux ratio with the associated uncertainties, 



^^^^ = dnnE^<^M)(^E,y (3-4) 
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Figure 5: First and second derivative of the atmospheric neutrino flux ratio, Eqs. (3.3) and (3.4). 

are shown in Fig. 5. Prom the figure we see that, within the present errors, the first deriva- 
tive is not a constant, or in other words one cannot parametrize the energy dependence of 
the flux uncertainty as a simple tilt correction. 



However, we also see that the second derivative is compatible with zero within errors in 
almost all the energy range, which implies that the uncertainty is not a much more strongly 
varying function of the energy. 

Finally from Eq. (2.19) we compute the predicted event rates for the A^rop fluxes 
and define their average as in Eq. (2.23) and their standard deviation as 



In Fig. 6 we show a comparison of the experimental data with the corresponding predictions 
of the neural network parametrization of the atmospheric neutrino flux. From the figure 
we see that the predicted rates are in good agreement with the data, but, as expected, 
have a smoother zenith angular dependence. 

Another interesting figure of merit to verify the correct statistical behavior of the 
neural network training is the distribution of Xdat ' defined as the x of set of neural 
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Figure 6: Predicted number of atmospheric neutrino events using the atmospheric fluxes resulting 
from the reference fit compared to the experimental data points. The central values correspond to 
the average prediction Eq. (2.23) and the error bars give the Icr ranges Eq. (3.6). Notice that only 
the statistical error is shown for the experimental data points. 
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Figure 7: Distribution of x'dat y ^'i- (3-7), in the reference case. 
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This distribution is shown in Fig. 7. Note that for all the fluxes Xdat — xtot ■ 



if we define Ax^ 

2(fe) 



2{k) 
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Xtot' 



where Xmal is the maximum value of the set of 68% fluxes 
with lower Xdat^^® that in the present analysis the 1-a range of fluxes obtained from 



Furthermore, 



the sample of trained neural networks corresponds to Ax^ 
\/ 2 A'dat ~ 13 as expected for a consistent distribution of fits. 



5, which satisfies Ax < 



4. Stability of the Results 

In this section we discuss the stability of our results with respect to different inputs used 
for the reference fit, such as some of the choices in the training procedure, the assump- 
tions made on the zenith angle and flavor dependence of the atmospheric fluxes, and the 
uncertainty on the neutrino oscillation parameters. 

4.1 Impact of Training Choices 

In order to verify the stability of the results with respect to the minimization algorithm 
used in the neural network training procedure first of all we have repeated the fit using 
genetic algorithms but with a larger number of iterations (A^it=300). The results are 
shown in Fig. 8. We see that this increase in the number of iterations does not lead to any 
substantial variation of the allowed range of fluxes. This implies that the minimization had 
indeed converged well before as it was illustrated in Fig. 2. 

Second, we have repeated the fit by using dynamical stopping of the training as alter- 
native minimization strategy. In dynamical stopping, instead of training each neural net a 
fixed number of iterations, the training of each net is stopped independently when a certain 
condition on the error function E^''^ is satisfied. In particular, in this training strategy one 
fixes a parameter Xstop ™d then stops the fit to each replica separately when the condition 
j^{k) ^ ;)(;2^Qp is satisfied, with £'(^) as stated in Eq. (2.21). It can be shown [20] that the 
typical values for E^'^^ are of the order of x^^^^ + -^dat- Therefore each different value of 
will result in a different value of the total Xtot; -^1- (2.22). Clearly one expects that 
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Figure 8: Dependence of the allowed ranges of fluxes on different choices in the reference training. 
The full region is the result of our reference training. The dotted lines are the range of extracted 
fluxes if using dynamical stopping of the minimization with Xg^op — 160. The dashed lines are 
the range of extracted fluxes if using genetic algorithms but with TVit = 300 iterations in the 
minimization. The dot-dashed lines are the range of extracted fluxes when removing the thrugoing 
muon sample from the fit. 



the higher the value of Xtot larger the flux error ranges. Thus to make the comparison 
meaningful we must chose a value of Xstop which leads to a Xtot of comparable value of the 
one obtained in the reference fit. We have verified that for Xstop ~ 

160, 

Xtot — 76.5 which 

is close enough to the reference fit value Xtot — 74.6. The results for this alternative fit are 
shown in Fig. 8. Again we see that the results obtained with both minimization strategies 
are very compatible. The main effect of using dynamical stopping is a slight increase of 
the allowed range of fluxes in the intermediate energy region. 

Finally, as a consistency check, we have verified that the obtained energy dependence 
in a given energy range is mostly determined by the data sample which is most sensitive 
to that energy range. We have done so by repeating the reference fit but removing the 
thrugoing muon sample in the analysis. The results are shown in Fig. 8 where we see 
that, while the fit is unaltered at lower energies, the results get considerably different for 
neutrinos with energies Ey > 50 GeV which are the ones responsible for thrugoing neutrino 
events as illustrated in Fig. 1. Basically once the thrugoing muon sample is removed from 
the fit, we have no experimental information for Ey > 100 GeV. Consequently the results 
of the fit at those higher energies are just an unphysical extrapolation of the fit at lower 
energies as clearly illustrated in the figure where we see that there is no lower bound to 
the allowed flux at those higher energies. 

This reflects the fact that the behavior of neural networks in the extrapolation region 
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is not determined by its behavior where more data is available, as it would happen in fits 
with usual functional forms. In principle, if the neural network had found some underlying 
physical law which described the experimental data and which would be valid both in the 
region where data is available and in the extrapolation region, the value of the extrapolated 
fluxes could be less unconstrained. However, this is not expected in this case since it is 
precisely at energies of the order of E^, ~ 100 GeV that the tt start interacting before they 
are able to decay which implies that the underlying physical law is different below and 
above those energies, and the extrapolation is therefore very much unconstrained. 

4.2 Impact of Choice for Zenith Angle and Flavor Flux Dependence 

As previously discussed, since the available experimental data from Supcr-Kamiokande 
are not precise enough to allow for a simultaneous independent determination of all the 
elements in the atmospheric neutrino fluxes, we have restricted our analysis to the deter- 
mination of their energy dependence, and taken the zenith angle and flavor dependence 
from a previous computation [6] . 

In order to assess the effects of this choice we first repeat the reference fit but using as 
reference flux the Bartol flux [7] . In the left panel of Fig. 9 we compare the results of these 
two fits. We sec that the results are identical for Ei, < 10 GeV as expected since both 
Honda and Bartol calculations give very similar angular and flavour ratios at those energies. 
And for any energy the difference between the results of both fits are much smaller than 
the differences between the reference fluxes themselves, see Fig. 3. This is what is expected 
since the fix uncertainties included in both fits represent the spread on the theoretical flux 
calculations of the angular dependence and flavour ratios. 

In the second column in Table 3 we give the statistical estimators corresponding to the 
fit taking Bartol as a reference fiux. As expected the differences with results using Honda 
as reference flux (first column in Table 3) are very small. 

The importance of the choice of angular and flavour dependence can also be addressed 
by studying the effect of removing from the analysis the fix errors since those errors ac- 
count for the spread of the predicted angular and flavour dependence among the different 
atmospheric flux calculations. In other words removing those errors we force the neural net 
flux to follow the angular and flavour dependence of the reference flux without allowing for 
any fluctuation about them. The results of this flt and its comparison with the reference flt 
are shown in the right panel of Fig. 9. The corresponding statistical estimators are given 
in the third column of Table 3 while in Fig. 10 we show the neural network predictions for 
the event rates in this case. 

As expected, Xtot (^^^ Table 3) is larger once the fix errors are not included in the fit, 
although given the small size of the fix errors this increase is very moderate (only 1.1 units 
for 90 data points). Equivalently from Table 3 we see the small impact that the exclusion 
of the fix errors makes in the evaluation of the statistical estimators of the experimental 
data ((T(®^P')(jat ) {p^^^'^^)da.t ) and (cov('^^P))dat which is again a reflection of the small values 
of the fix compared to the experimental statistical and systematic uncertainties. 

There is, however, a much more important effect in the size of the allowed range of 
fluxes and predicted rates in as seen in Figs. 9 and 10. As shown in Fig. 4 the relative 
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Figure 9: left Comparison of results with different reference fluxes, right Comparison of deter- 
mined fluxes in flts which include different sets of errors 



error of the flux is reduced by a factor ~ 3-4 in the full energy range. At first this 
considerable reduction of the relative flux error when removing only the relatively small fix 
uncertainties may seem counterintuitive. However this result is expected if the results of 
the fit are consistent. This is because the zenith and flavor dependence is not fitted from 
data. As a consequence, if no uncertainties are included associated with those, there are 
Ne = 20 or 10 binned rates of similar statistical weight (20 from the e-like and mu-like 
distributions for sub- mid- and multi-GeV events and 10 for partially contained, stopping 
and thrugoing muons) contributing to the determination of the flux in the same energy 
range with no allowed fluctuations among them. Thus the associated uncertainty in the 
determination of the energy dependence is indeed reduced by a factor \^Ne ~ 3-5. This is 
also reflected in Table 3, where we see that there is a reduction on the statistical estimators 
which measure the average spread of the predicted rates obtained when using the neural 
net fluxes, ((T('^'^t))dat and (cov('^'=t))daf^ 

The inclusion of the fix uncertainties and their corresponding pulls allows for the 
angular and flavour ratio of the fitted fluxes to spread around their reference flux values. 
This results into an effective decoupling of the contribution of the Ne data points to the fit 
at a given E with the corresponding increase in the flux relative error. Furthermore once 
the fix uncertainties are included the range of the angular binned rate predictions are of 
the same order of the statistical error of the experimental points (see Fig. 1). 

Finally let's also point out that, in general, adding or removing some source of uncer- 

^Since (cov'"'^*-')dat is proportional to the square of the u'"''*', the reduction in this case is a factor 
16- (^/iVB)^ 



- 22 - 



^ Neural Network exp+cross i Experimental Data 

250 1 — ^ — ^ — ^ — ^ — \ — ^ — ^ — ^ — ^ — I 200 

180 



200 



150 



_ sub— GeV e— like 
-1 



160 
140 
120 



00 



75 



50 



IT 



180 

160 

140 

120 

100 

80 
200 

150 

100 

50 



: sub-GeV /i-like 



100 

1 -1 

300 



250 



200 



: mid— GeV e-like 





150 



100 



2 25 
1 -1 



100 



: multi-GeV e-like : 

1 



rrnid-GeV /^-like 



50 



e 

_ multi-GeV /x-like 



1 -1 
100 



5 



: Part Cont us 



80 
60 
40 
20 




: Stopping /xs 



1-1 

CosiS- 




Figure 10: Predicted number of atmospheric neutrino events using the atmospheric fluxes resuhing 
from the fit with exp+cross errors only compared to the experimental data points. The central values 
correspond to the average prediction and the error bars give the la ranges. Notice that only the 
statistical error is shown for the experimental data points. 



tainty does not only change the size of the associated errors in the parametrization but 
also the position of the minimum, that is, the features of the best-fit flux, although this 
effect is small in the present case. 

4.3 Impact of Oscillation Parameters 

So far the results presented for the different fits have been done assuming — > i/j- oscilla- 
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Figure 11: Comparison of the reference fit with the envelope of fits obtained varying the atmo- 
spheric neutrino osciUation parameters given in the label. 

tions with oscillation parameters fixed to 

sm^29,tm = l, Aml^ = 2.2 x lO'^ eV^ . (4.1) 

In order to address the impact on the results of the assumed value of the oscillation pa- 
rameters we repeat the fit with these parameters varying within their 1-sigma ranges as 
allowed by global fits to neutrino oscillations data. In particular we consider the range 

sin2 26latm > 0.96, , 1.8 x 10"^ eV^ < Aml^^ < 2.7 x 10"^ eV^ . (4.2) 

The results are shown in Fig. 11 where we show the envelope of the results obtained 
varying each time one of the neutrino oscillation parameters. As we can see in Fig. 11, the 
contribution to the total error from the uncertainty in the neutrino oscillation parameters 
is rather small. Therefore we can be confident than the impact in our results of the 
uncertainties in the oscillation parameters is very small, and moreover this uncertainty can 
be systematically reduced as our knowledge of neutrino oscillation parameters increases. 

5. Summary and Outlook 

In this work we have presented the first results on the determination of the energy de- 
pendence of the atmospheric neutrino fluxes from the data on atmospheric neutrino event 
rates measured by the Super-Kamiokande experiment. In order to bypass the problem of 
the unknown functional form for the neutrino fluxes we have made use of artificial neural 
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Figure 12: Results for the reference fit for the angular averaged muon neutrino plus anti neutrino 
flux extrapolated to the high energy region compared to the corresponding data from AMANDA [37] . 



networks as unbiased interpolants. On top of this, a faithful estimation of the uncertainties 
of the neutrino flux parametrization has been obtained by the use of Monte Carlo methods. 

In our analysis we have relied on the zenith and flavour dependence of the flux as 
predicted by some of the atmospheric flux calculations in Refs. [6-8]. Also, the fluxes 
are determined under the assumption that oscillation parameters will eventually be inde- 
pendently determined by non atmospheric neutrino experiments with a value close to the 
present best fit. We have estimated the uncertainties associate with these choices by per- 
forming alternative fits to the data where some of these assumptions were changed and/or 
relaxed. 

Our main result is presented in Fig. 3. We have found that until about 1 TeV we 

have a good understanding of the normalization of the fluxes and the present accuracy from 
Super-Kamiokande neutrino data is comparable with the theoretical uncertainties from the 
numerical calculations. The results of our alternative fits shows that if one assumes that 
the present uncertainties of the angular dependence have been properly estimated, it turns 
out that the assumed angular dependence has very little effect on the determination of the 
energy dependence of the fluxes. Thus the determined atmospheric neutrino fluxes could 
be used as an alternative of the existing flux calculations, and are available upon request 
to the authors. 

The results of this work can be extended in several directions. It would be interesting 
to include in the analysis the atmospheric neutrino data from detectors that probe the 
high energy region, hke AMANDA [36,37] or ICECUBE [38]. To illustrate the reach of 
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the presently available statistics at those energies we show in Fig. 12 the results for the 
reference fit for the angular averaged muon neutrino plus antineutrino flux extrapolated 
to the high energy region compared to the data from AMANDA [36,37]. Notice that, 
as mentioned above, the behavior of neural networks in the extrapolation region is not 
determined by its behavior where data is available, as it would happen in fits with usual 
functional forms. As a consequence the values of the extracted fluxes in the extrapolation 
region can be extremely unphysical as described in Sec. 4.1. To improve the extrapolation, 
one could use high-energy functional forms for the atmospheric neutrino flux, for example 
those presented in [39], which have been used in [40] to fit analytical expressions for the 
fluxes to the Monte Carlo simulations. The implementation of this strategy is postponed 
to future work, when also the AMANDA and ICECUBE atmospheric neutrino data will 
be incorporated in the fit. 

Furthermore one could assess the effects of determining from experimental data the 
full energy, zenith and flavor dependence of the atmospheric neutrino fluxes together with 
the oscillation parameters, in particular in the context of the expected data from future 
megaton neutrino detectors [15,16]. 

Acknowledgments 

We thank F. Halzen for comments and careful reading of the manuscript. J. R. would 
like to acknowledge the members of the NNPDF Collaboration: Luigi del Debbio, Stefano 
Forte, Jose Ignacio Latorre and Andrea Piccione, since a sizable part of this work is related 
to an upcoming common publication [24]. M.C. G-G would hke to thank the CERN 
theory division for their hospitality during the weeks previous to the finalization of this 
work. This work is supported by National Science Foundation grant PHY-0354776 and by 
Spanish Grants FPA-2004-00996 and AP2002-2415. 

A. Statistical Estimators 

In this Appendix we summarize the estimators used to validate the generation of the 
Monte Carlo sample of replicas of the experimental data. The corresponding estimators 
that validate the neural network training can be straightforwardly obtained by replacing 
the superscript with (^^^) . 

• Average over the number of replicas for each experimental point i 




(A.l) 



• Associated variance 




(A.2) 
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• Associated covariance 
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(A.4) 



The three above quantities provide the estimators of the experimental central values, 
errors and correlations which one extracts from the sample of experimental data. 



• Mean variance and percentage error on central values over the A^^at data points. 
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for er- 



ations and covariances respectively. 

These estimators indicate how close the averages over generated data are to the 
experimental values. Note that in averages over correlations and covariances one has 
to use the fact that correlation and covariances matrices are positive definite, and 
thus one has to be careful to avoid double counting. For example, the percentage 
error on the correlation will be defined as 
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and similarly for averages over elements of the covariance matrix. 
• Scatter correlation: 

^^(exp) ^ _ (i?(<^^P)>^^^ ((i?(^rt)^ 
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where the scatter variances are defined as 
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We define analogously r [cj'^^''*)] , r [p'-^''*^] and r [cov^^""*)] . Note that the scatter corre- 
lation and scatter variance are not related to the variance and correlation Eqs. (A.2)- 
(A.4). The scatter correlation indicates the size of the spread of data around a straight 
line. Specifically r [cr'^'^^*)] = 1 implies that (cr^"''*'' ) is proportional to a\^^'^\ 



• Average variance: 

1=1 

We define analogously {p^^^^)^^^ and (cov'^^''*^)^^^, as well as the corresponding ex- 
perimental quantities. These quantities are interesting because even if r are close to 
1 there could still be a systematic bias in the estimators Eqs. (A.2)-(A.4). This is 
so since even if all scatter correlations are very close to 1, it could be that some of 
the Eqs. (A.2)-(A.4) where sizably smaller than its experimental counterparts, even 
if being proportional to them. 

The typical scaling of the various quantities with the number of generated replicas iV^cp 
follows the standard behavior of gaussian Monte Carlo samples. For instance, variances on 
central values scale as l/Nj-^p, while variances on errors scale as 1/ -sjN^^^. Also, because 



y[p(art)j 



iVrep 



(A.12) 



the estimated correlation fiuctuates more for small values of p^^^\ and thus the average 
correlation tends to be larger than the corresponding experimental value. 



B. The Monte Carlo Approach to Error Estimation 

In this Appendix we show with a simple example how the Monte Carlo approach to error 
estimation is equivalent to the standard approach, based on the condition Ax^ = 1 for the 
determination of confidence levels, with the assumption of gaussian errors, up to linearized 
approximations. For a more detailed analysis of this statistical technique the reader is 
referred to [31]. 

Let us consider two pairs of independent measurements of the same quantity, xi ± a\ 
and X2 ± (T2 with gaussian uncertainties. The distribution of true values of the variable x 
is a gaussian distribution centered at 

and with variance determined by the Ax^ = 1 tolerance criterion, 

'1+02 



a^ = -^. (B.2) 
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To obtain the above results, note that if errors are gaussianly distributed, the maximum 
hkehhood condition imphes that the mean x minimizes the function 

2 _ {Xl - X) {X2 - x) 

X - 2 1 2 ' V^-^^l 

(Ti (72 

and the variance a is determined by the condition 

^X^ = {x + a) - {x) , (B.4) 

which for Ax^ = 1 leads to Eq. (B.2). Note that these properties only hold for gaussian 
measurements. 

An alternative way to compute the mean and the variance of the combined measure- 
ments Xl and X2 is the Monte Carlo method: generate A'rep replicas of the pair of values 
xi,X2 gaussianly distributed with the appropriate error, 

xf^ = Xl + rf ^(71, = 1, . . . , TVrep , (B.5) 

xf^ =X2 + rf ^2, A; = 1, . . . , TVrep , (B.6) 

where A^'^ are univariate gaussian random numbers. One can then show that for each pair, 
the weighted average 



0^2 "T Xl 

al + al 

is gaussianly distributed with central value and width equal to the one determined in the 
previous case. That is, it can be show that for a large enough value of iVj-ep, 

iVrcp 

\ / rep TVrep 

and for the variance 

which is the same result, Eq. (B.2), as obtained from the A^;^ = 1 criterion. This shows 
that the two procedures are equivalent in this simple case. 

The generalization to N^x^x gaussian correlated measurements is straightforward. Let 
us consider for instance that the two measurements xi and X2 are not independent, but 
that they have correlation p\2 < 1- To take correlations into account, one uses the same 
Eqs. (B.5) and (B.6) to generate the sample of replicas of the measurements, but this time 
the random numbers r^'^'* and are univariate gaussian correlated random numbers, that 
is, they satisfy 



rep 
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With this modification, the sample of Monte Carlo replicas of xi and X2 also reproduces the 
experimental correlations. This can be seen with the standard definition of the correlation, 

= (rir2),ep = Pi2 ■ (B.ll) 

rep 

Therefore, the Monte Carlo approach also correctly takes into account the effects of corre- 
lations between measurements. 

In realistic cases, the two procedures are equivalent only up to linearizations of the 
underlying law which describes the experimental data. We take the Monte Carlo procedure 
to be more faithful in that it docs not involve linearizing the underlying law in terms of 
the parameters. Note that as emphasized before, the error estimation technique that is 
described in this thesis does not depend on whether one uses neural networks or polynomials 
as interpolants. 



P = 
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C. Neural Network Details 

Artificial neural networks [32, 33] provide unbiased robust universal approximates to in- 
complete or noisy data. An artificial neural network consists of a set of interconnected 
units {neurons). The activation state ^^-^^ of a neuron is determined as a function of the 
activation states of the neurons connected to it. Each pair of neurons is connected 

by a synapses, characterized by a weight coij. In this work we will consider only multi- 
layer feed-forward neural networks. These neural networks are organized in ordered layers 
whose neurons only receive input from a previous layer. In this case the activation state of 
a neuron in the (l-l-l)-th layer is given by 

= 5 (/if +'^) , i = l,...,ni+u / = 1,...,L-1, (C.l) 

nil) 

where 9f^ is the activation threshold of the given neuron, ni is the number of neurons in the 
1-th layer, L is the number of layers that the neural network has, and g{x) is the activation 
function of the neuron, which we will take to be a sigma function, 

9ix) = , (C.3) 

except in the last layer, where we use a linear activation function g{x). This enhances 
the sensitivity of the neural network, avoiding the saturation of the neurons in the last 
layer. The fact that the activation function g[x) is non-linear allows the neural network to 
reproduce nontrivial functions. 

Therefore multilayer feed-forward neural networks can be viewed as functions F : 
T?."! — TZ^i- parametrized by weights, thresholds and activation functions, 

j = l...,nL. (C.4) 
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It can be proved that any continuous function, no matter how complex, can be represented 
by a multilayer feed-forward neural network. In particular, it can be shown [32, 33] that 
two hidden layers suffice for representing an arbitrary function. 

In our particular case the architecture of the neural network used is 1-5-5-1, which 
means that it has a single input neuron (whose value is the neutrino energy), two hidden 
layers with 5 neurons each and a final output neuron (whose value is the atmospheric 
neutrino flux). 

D. Genetic Algorithms 

Genetic algorithms arc the generic name of function optimization algorithms that do not 
suffer of the drawbacks that deterministic minimization strategies have when applied to 
problems with a large parameter space. This method is specially suitable for finding the 
global minima of highly nonlinear problems. 

All the power of genetic algorithms lies in the repeated application of two basic opera- 
tions: mutation and selection. The first step is to encode the information of the parameter 
space of the function we want to minimize into an ordered chain, called chromosome. 
If A'par is the size of the parameter space, then a point in this parameter space will be 
represented by a chromosome. 



In our case each bit ai of a chromosome corresponds to cither a weight or a threshold 
of^ of a neural network. Once we have the parameters of the neural network written as a 
chromosome, we replicate that chain until we have a number A'tot of chromosomes. Each 
chromosome has an associated fitness /, which is a measure of how close it is to the best 
possible chromosome (the solution of the minimization problem under consideration). In 
our case, the fitness of a chromosome is given by the inverse of the function to minimize, 
E'^ given in Eq. (2.21). 

Then we apply the two basic operations: 

• Mutation: Select randomly a bit (an element of the chromosome) and mutate it. The 
size of the mutation is called mutation rate rj, and if the k-th bit has been selected, 
the mutation is implemented as 



where r is a uniform random number between and 1. The optimal size of the 
mutation rate must be determined for each particular problem, or it can be adjusted 
dynamically as a function of the number of iterations. 

• Selection: Once mutations and crossover have been performed into the population 
of individuals characterized by chromosomes, the selection operation ensures that 
individuals with best fitness propagate into the next generation of genetic algorithms. 
Several selection operators can be used. The simplest method is to select simply the 
A^chain chromosomes, out of the total population of Atot individuals, with best fitness. 



a= {ai,a2,a3,...,aNp,,) ■ 
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The procedure is repeated iteratively until a suitable convergence criterion is satisfied. 
Each iteration of the procedure is called a generation. A general feature of genetic algo- 
rithms is that the fitness approaches the optimal value within a relatively small number of 
generations, as seen in Fig. 2 for the present problem. 
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